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Abstract 

Background: Cluster randomised controlled trials (CRCTs) are frequently used in health service evaluation. 
Assuming an average cluster size, required sample sizes are readily computed for both binary and continuous 
outcomes, by estimating a design effect or inflation factor. However, where the number of clusters are fixed in 
advance, but where it is possible to increase the number of individuals within each cluster, as is frequently the 
case in health service evaluation, sample size formulae have been less well studied. 

Methods: We systematically outline sample size formulae (including required number of randomisation units, 
detectable difference and power) for CRCTs with a fixed number of clusters, to provide a concise summary for 
both binary and continuous outcomes. Extensions to the case of unequal cluster sizes are provided. 

Results: For trials with a fixed number of equal sized clusters (k), the trial will be feasible provided the number of 
clusters is greater than the product of the number of individuals required under individual randomisation (n,) and 
the estimated intra-cluster correlation (p). So, a simple rule is that the number of clusters (k) will be sufficient 
provided: 

k > ni x p. 

Where this is not the case, investigators can determine the maximum available power to detect the pre-specified 
difference, or the minimum detectable difference under the pre-specified value for power. 

Conclusions: Designing a CRCT with a fixed number of clusters might mean that the study will not be feasible, 
leading to the notion of a minimum detectable difference (or a maximum achievable power), irrespective of how 
many individuals are included within each cluster. 



Introduction 

Cluster randomised controlled trials (CRCTs), in which 
clusters of individuals are randomised to intervention 
groups, are frequently used in the evaluation of service 
delivery interventions, primarily to avoid contamination 
but also for logistic and economic reasons [1-3]. Whilst 
a well conducted individually Randomised Controlled 
Trial (RCT) is the gold standard for assessing the effec- 
tiveness of pharmacological treatments, the evaluation 
of many health care service delivery interventions is dif- 
ficult or impossible without recourse to cluster trials. 
Standard sample size formulae for CRCTs require the 
investigator to pre-specify an average cluster size, to 
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determine the number of clusters required. In so doing, 
these sample size formulae implicitly assume that the 
number of clusters can be increased as required [1,3-5]. 

However, when evaluating health care service delivery 
interventions the number of clusters might be limited to 
a fixed number even though the sample size within each 
cluster can be increased. In a real example, evaluating 
lay pregnancy support workers, clusters consisted of 
groups of pregnant women under the care of different 
midwifery teams [6,7]. The available number of clusters 
was restricted to the midwifery teams within a particular 
geographical region. Yet within each midwifery team it 
was possible to recruit any reasonable number of indivi- 
duals by extending the recruitment period. In another 
real example, a CRCT to evaluate the effectiveness of a 
combined polypill (statin, aspirin and blood pressure 
lowering drugs) in Iran was limited to a fixed number of 
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villages participating in an existing cohort study [8]. 
Other such examples of designs in which a limited num- 
ber of clusters were available include trials of commu- 
nity based diabetes educational programs [9] and 
general practice based interventions to reduce primary 
care prescribing errors [10], both of which were limited 
to the number of general practices which agreed to 
participate. 

The existing literature on sample size formulae for 
CRCTs focuses largely on the case where there is no 
limit on the number of available clusters [3-5,11,12]. 
Whilst it is well known that the statistical power that can 
be achieved by additional recruitment within clusters is 
limited, and that this depends on the intra-cluster corre- 
lation [11-13], little attention has been paid to the limita- 
tions imposed when the number of clusters is fixed in 
advance. This paper aims to fill this gap by exploring the 
range of effect sizes, and differences between proportions, 
that can be detected when the number of clusters is fixed. 
We describe a simple check to determine whether it is 
feasible to detect a specified effect size (or difference 
between proportions) when the number of clusters are 
fixed in advance; and for those cases in which it is infeasi- 
ble, we determine the minimum detectable difference 
possible under the required power and the maximum 
achievable power to detect the required difference. We 
illustrate these ideas by considering the design of a 
CRCT to detect an increase in breastfeeding rates where 
the number of clusters are fixed. 

For completeness we outline formulae for simpler 
designs for which the sample size formulae are relatively 
well known, or easily derived, as an important prelude. 
In so doing, the simple relationships between the formu- 
lae are clear and this allows progressive development to 
the less simple situation (that of binary detectable differ- 
ence or power). It is hoped that by developing the for- 
mulae in this way the material will be accessible to 
applied statisticians and more mathematically minded 
health care researchers. We also provide a set of guide- 
lines useful for investigators when designing trials of 
this nature. 

Background 

Generally, suppose a trial is to be designed to test the 
null hypothesis H 0 ■ [to = fii where fio and \i\ represent 
the means of some variable in the control and interven- 
tion arms respectively; and where it is assumed that var 
(^o) = y ar(/<i) = cr 2 . Suppose further that there are an 
equal number of individuals to be randomised to both 
arms, letting n denote the number of individuals per 
arm and letting d denote the difference to be detected 
such that d = fio - fii, 1 - fi denotes the power and a 
the significance level. We limit our consideration to 
trials with two equal sized parallel arms, with common 



standard deviation, two-sided test, and assume normality 
of outcomes and approximate the variance of the differ- 
ence of two proportions. The sub-script, I (for Indivi- 
dual randomisation), is used throughout to highlight any 
quantities which are specific to individual randomisa- 
tion; and likewise the sub-script, C (for Cluster rando- 
misation), is used throughout to highlight any quantities 
which are specific to cluster randomisation. No sub- 
scripts are used to distinguish cluster from individual 
randomisation for variables which are pre-specified by 
the user. 

RCT: sample size formulae under individual 
randomisation 

Following standard formulae, for a trial using individual 
randomisation[14], for fixed power (1 - f}) and fixed 
sample size («) per arm, the detectable difference, d It 
with variance var(<f/) = 2a 2 /«/ is: 



2a 1 



-{Za/2 



(D 



where z a / 2 denotes the upper 100a/2 standard normal 
centile. 

For a trial with n individuals per arm, the power to 
detect a pre-specified difference of d, is 1 - /3j, such that: 



jn d 
V 2 o 

or equivalently: 
Pi 



, pn d 

2 cr 



Za/2 



(2) 



where <J> is the cumulative standardised Normal 
distribution. 

And, finally the required sample size per arm for a 
trial at pre-specified power 1 - /3 to detect a pre-speci- 
fied difference of d, is n h where: 



Hi = 2a 2 



(z a / 2 +Z/)) 2 
I 2 



(3) 



Using Normal approximations, the above formulae can 
be used for binary outcomes, by approximating the var- 
iance (cr 2 ) of the proportions n\ and n 2 , by: 



cr 2 « l/2[jz 1 (l - m) + n 2 {l - 7T 2 )} 
for testing the two sided hypothesis H 0 : n 1 = n 2 . 



(4) 



CRCTs: standard sample size formulae under cluster 
randomisation 

Suppose, instead of randomising over individuals, the 
trial will randomise the intervention over k clusters per 
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arm each of size m, to provide a total of n c = mk indivi- 
duals per arm. Then, by standard results [1], the var- 
iance of the difference to be detected d c is inflated by 
the Variance Inflation Factor (VIF): 



VIF = [1 + (m- \)p) 



(5) 



where p is the Intra-Cluster Correlation (ICC) coeffi- 
cient, which represents how strongly individuals within 
clusters are related to each other. Where the cluster 
sizes are unequal this variance inflation factor can be 
approximated by: 



VZF= [1 + {{cv 2 + l)m- l)p] 



(6) 



where cv represents the coefficient of variation of the 
cluster sizes and fh is the average cluster size [15]. Thus, 
the variance of d c (for fixed cluster sizes) becomes: 



var(dc) 



2cr 2 [l + (m- l)p] 
mk 



(7) 



and this is simply extended for varying cluster sizes 
using equation 6. To determine the required sample size 
for a CRCT with a pre-specified power 1 - /J, to detect 
the pre-specified difference d, and where there are m 
individuals within each cluster, then the required sample 
size «c = km per arm, follows straightforwardly from 
equations 3 and 5 and is: 



n c 



2a' 



(Za/2 +Zfj) 2 [l + (m - l)p] 



d 2 



= ri[[l + (m — l)p) 
= tii x VIF 



(8) 



where rij is the required sample size per arm using a 
trial with individual randomization to detect a difference 
d, and VIF can be modified to allow for variation in 
cluster sizes (equation 6). This is the standard result, 
that the required sample size for a CRCT is that 
required under individual randomisation, inflated by the 
variance inflation factor [1]. The number of clusters 
required per arm is then: 



fe= r "j(l + - l)p) 



(9) 



assuming equal cluster sizes. This slight modification 
of the common formula for the number of required 
clusters (over that say presented in [2]), has rounded up 
the total sample size to a multiple of the cluster size 
(using the ceiling function). For, unequal cluster sizes 
(using the VIF at equation 6) this becomes: 



m[l + {{cv 2 + l)m- l)p] 

fe = r = 1 



(10) 



again with rounding up to the average cluster size. 

CRCTs of fixed size: fixed number of clusters each 
of fixed size 

Where a CRCT is to be designed with a completely 
fixed size, that is with a fixed number of clusters, each 
of a fixed size (although this size may vary between 
clusters), then it is possible to evaluate both the detect- 
able difference and the power, as would be the case in a 
design using individual randomisation. CRCTs of fixed 
size might not be the commonest of designs, but formu- 
lae presented below: are an important prelude to later 
formulae, might be useful for retrospectively computing 
power once a trial has commenced (and thus the size 
has been determined), and will also be useful in those 
limited number of studies for which the trial sample 
size is indeed completely fixed (for example within a 
cohort study) [9,10]. 

CRCT of fixed size: detectable difference 

For a CRCT with a fixed number of clusters k per arm, 
with a fixed number of individuals per cluster m and 
with power 1 -/?, then the detectable difference, d c , fol- 
lows straightforwardly from equation 1: 



2<r 2 [l + {m- l)p] 
mk 

= diy/\l + {m- Tjp] 
= d,WlF 



{Za/2 +Zp) 



(ID 



where d I is the detectable difference using individual 
randomisation and VIF might be either of those pre- 
sented at equations 5 and 6. So the detectable difference 
in a CRCT can be thought of as the detectable differ- 
ence in a trial using individual randomisation, inflated 
by the square-root of the variance inflation factor. 

CRCTs of fixed size: power 

The power 1 - /J c of a trial designed to detect a differ- 
ence of d with fixed sample size n c = mk per arm, fol- 
lowing equation 2, is: 



2 crVVZF 
or equivalently, that: 



p c = 4> 



I mk 



Za/2 



(12) 



m 



where again, VIF might be either of those presented at 
equations 5 and 6. So, power in a CRCT can be thought 
of as the power available under individual randomisation 
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for a standardised effect size which is deflated by the 
square-root of the variance inflation factor. 

CRCTs with fixed number of clusters but flexible 
cluster size 

Standard sample size formulae for CRCTs, by assuming 
knowledge of the cluster size (m) and determining the 
required number of clusters (k), implicitly assume that 
the number of clusters can be increased as required. 
However, in the design of health service interventions, it 
is often the case that the number of clusters will be lim- 
ited by the number of cluster units willing or able to 
participate. So for example, in two general practice 
based CRCTs (one to evaluate lay education in diabetes 
and the other to evaluate a general practice-based inter- 
vention to reduce primary care prescribing errors), the 
number of clusters was limited to the number of pri- 
mary care practices that agreed to participate in the 
study. From an estimate of the number of clusters avail- 
able, it is relatively straightforward to determine the 
required cluster size for each of the clusters. However, 
due to the limited increase in precision available by 
increasing cluster sizes, it might not always be feasible 
to detect the required difference at required power 
under a design with a fixed number of k clusters. These 
issues are explored below. 

CRCTs with a fixed number of clusters: sample size per 
cluster 

The standard sample size formulae for CRCTs assumes 
knowledge of cluster size (m) and consequently deter- 
mines the number of clusters (k) required. For a pre- 
specified available number of clusters {k), investigators 
need instead to determine the required cluster size (m). 
Whilst this sample size formula is not commonly pre- 
sented in the literature, it consists of a simple re- 
arrangement of the above formulae presented at equa- 
tion 8 [2]. So, for a trial with a fixed number of equal 
sized clusters (/c) the required sample size per arm for a 
trial with pre-specified power 1 - fi, to detect a differ- 
ence of d, is n c , such that: 



tic = ni[l + (m — 

- n '[ l <T- l ) p ] 

_ n f fe[l - p] 
[k - nip] 



(13) 



where «/ is the sample size required under individual 
randomisation. This increase in sample size, over that 
required under individual randomisation, is no longer a 
simple inflation, as the inflation required is now depen- 
dent on the sample size required under individual 
randomisation. 



The corresponding number of individuals in each of 
the k equally sized clusters is: 



nj(l - p) 
(fe - nip) 



(14) 



this time rounding up the total sample size to a multi- 
ple of the number of clusters (/c) available (using the 
ceiling function). 

For unequal cluster sizes, using the VIF from equation 
6, the required sample size is: 



njfe[l - p] 
[k — tii{cv 2 + l)p] 



(15) 



and the average number of individuals per cluster 
becomes: 



- r "/(I-P) , 

m = \t 7-3 — tt! 

fe — ni\cv L + 1) 



(16) 



again rounding up to a multiple of the number of 
clusters (/c) available. 

CRCT with a fixed number of clusters: feasibility check 

When designing a CRCT with a fixed number of clus- 
ters, because of the diminishing returns that sets in 
when the sample size of each cluster is increased, it may 
not be possible to detect the required difference at pre- 
specified power [2]. In a CRCT with a fixed number of 
individuals per cluster, but no limit on the number of 
clusters, no such limit will exist. This limit on the differ- 
ence detectable (or alternatively available power) stems 
from the maximum precision available within a CRCT 
with a limited number of clusters. Recall that the preci- 
sion of the estimate of the difference is: 



var l {d c ) 



mk 



mk 



2a 2 VIF 2a 2 \l + (m - l)p] 



(17) 



As the cluster size (m) becomes large, this precision 
reaches a theoretical limit: 



lim var 1 (dc) = lim 



mk 



2cr 2 [l + {m- l)p] 2a 2 p 



(18) 



This limit therefore provides an upper bound on the 
precision of an estimate from a CRCT. If the CRCT is 
to achieve the same or greater power as a corresponding 
individually randomised design, it is required that: 



k 



ni 



2a 2 p 2o+ 



(19) 
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for equal cluster sizes; and: 



Re-arranging this as a function of n 2 is: 



k 



2o 2 p(cv 2 + 1) 2a 2 



(20) 



for unequal cluster sizes. A simple feasibility check, to 
determine whether a fixed number of available clusters 
will enable a trial to detect a required difference at 
required power, therefore consists of evaluating whether 
the following inequality holds: 



k > nip 

for equal cluster sizes [2], and 
k > nip(c\p- + 1) 



(21) 



(22) 



for unequal cluster sizes. Here, rij is the required sam- 
ple size under individual randomisation, k is the avail- 
able number of clusters, p is the estimated intra-cluster 
correlation coefficient, and cv represents the coefficient 
of variation of cluster sizes. When this inequality does 
not hold, it will be necessary to re-evaluate the specifica- 
tions of this sample size calculation. This might consist 
of a re-evaluation of the power and significance level of 
the trial, or it might consist of a re-evaluation of the 
detectable difference. Bounds, imposed as a result of the 
limited precision, on the detectable difference and 
power are derived below. 

CRCT with a fixed number of clusters: minimum 
detectable difference 

For a trial with a fixed number of clusters {k), and 
power 1 - /3, the theoretical Minimum Detectable Differ- 
ence (MDD) for an infinite cluster size is d MDD , where: 



■IMDD 



lim„ 



2er 2 [l + (m- l)p] 



ink 



(z a / 2 +Zp) 



2a 2 p 



(23) 



[Za/2 +Zp) 



which follows naturally from the formula for detect- 
able difference (equation 1) and the bound on precision 
(equation 18). This therefore gives a bound on the 
detectable difference achievable in a trial with a fixed 
number of clusters. 

For the case of two binary outcomes, where Jii is fixed 
(and ji 2 > Tii), then the minimum detectable difference 
for a fixed number of clusters per arm k, is d c = n 2 - tc 1 
such that: 



p{z a /2 +Zp) 2 [jTl(l - JTl) + JT 2 (1 ~ JT 2 )]. 

k 



(24) 



0 = an 2 + bn2 + c 



(25) 



where a = - (1 + w), b = 2n l + w, 
c = wtz\{ \ — izi) — jip and w = p(z a/2 + Zp) 2 /k. Solving 
this quadratic gives: 



JT 2 = 



-b ± vT*? 2 - 4ac) 
2a 



(26) 



Each of these two solutions to this quadratic will pro- 
vide the limit on n 2 for two sided tests. 

CRCT with a fixed number of clusters: maximum 
achievable power 

For a trial again with a fixed number of clusters (k), the 
theoretical Maximum Achievable Power (MAP) to 
detect a difference d is 1 - j3 map where: 



km 



lim 

m-^oo \j 2(1 + (m — ljp) a 



■ Z a /2 



d 

Za/2 




(27) 




which again follows from the formula for power 
(equation 2) and the bound on precision (equation 18). 
So the maximum achievable power is 1 - Pmap where: 



(28) 



This therefore provides an upper limit on the power 
available under a design with a fixed number of clus- 
ters k. 

CRCT with a fixed number of clusters: practical advice 

When designing a CRCT with a fixed number of clus- 
ters, researchers should be aware that such trials will 
have a limited available power, even when it is possible 
to increase the number of individuals per cluster. In 
such circumstances, it will be necessary to: 

(a) Determine the required number of individuals 
per arm in a trial using individual randomisation 

(«/)• 

(b) Determine whether a sufficient number of clus- 
ters are available. For equal sized clusters, this will 
occur when: 

k > nip 
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where «/ is the sample size required under individual 
randomisation, p is the intra-cluster correlation coef- 
ficient, and k is the number of clusters available in 
each arm. For unequal sized clusters: 

k > riip[cv 2 + 1) 

where cv is the coefficient of variation of cluster 
sizes. 

(c) Where the design is not feasible and cluster sizes 
are unequal, determine whether the design becomes 
feasible with equal cluster sizes (i.e. if k > nfi). 

(d) Where the design is still not feasible: 

(i) Either: the power must be reset at a value 
lower than the maximum available power (equa- 
tion 28), 

(ii) Or: the detectable difference must be set 
greater than the minimum detectable difference 
(equations 23 (continuous outcomes) and 26 
(binary outcomes)), 

(iii) Or: both power and detectable difference are 
adjusted in combination. 

(e) Once a feasible design is found, determine the 
required number of individuals per cluster from 
equations 14 (for equal cluster sizes) and 16 (for 
varying cluster sizes). 

General examples 

Maximum achievable power for cluster designs with 10, 
20, 30, 50 or 100 clusters per arm are presented in Fig- 
ure 1 for standardised effect sizes ranging from 0.05 to 
0.30 and for ICCs in the range 0 to 0.1 (which are com- 
mon ICCs in the medical literature [16]). As expected, 
achievable power increases with increasing numbers of 
clusters and increasing effect size. For the smallest effect 
size considered, 0.05, even 100 clusters per arm is not 
sufficient to obtain anywhere near an acceptable power 
level for ICCs above about 0.02. For less extreme effect 
sizes, such as 0.2 when there are 50 or 100 clusters 
available per arm, for ICCs less than about 0.1 power in 
the level of 80% will be obtainable; yet where there are 
just 10 or 20 clusters available, 80% power will only be 
attainable for ICCs less than about 0.06. Figure 2 shows 
similar estimates of maximum achievable power for bin- 
ary comparisons at baseline proportions ranging from 
0.05 to 0.5 to detect increases of 0.1 (i.e. 10 percentage 
points on a percentage scale). 

Minimal detectable differences are also presented for 
both standardised effect sizes (Figure 3) and proportions 
(Figure 4) for 80% power. As expected, increasing the 
number of clusters reduces the minimum detectable dif- 
ference. Therefore with a large number of clusters avail- 
able and sufficient numbers of individuals per cluster, 



trials are possible to detect small changes in proportions 
and standardised effect sizes. On the other hand, for 
trials with few clusters (say 10 or 20 per arm), minimum 
detectable differences become large. So, for example for 
continuous outcomes, with say 10 clusters per arm and 
an ICC in the region of 0.02, then the MDD is in the 
region of 0.2 standardised effect sizes (Figure 3). For 
binary outcomes (Figure 4) with 10 clusters per arm and 
ICC in the region of 0.02 the minimum detectable dif- 
ference is in the region of about a 10 percentage point 
change (i.e. from about 15% to 25%). 

Example 

In a real example, a CRCT is to be designed to evaluate 
the effectiveness of lay support workers to promote 
breastfeeding initiation and sustainability until 6 weeks 
postpartum. Due to fears of contamination, whereby 
new mothers indivertibly gain access and support from 
the lay workers, the intervention is to be randomised 
over cluster units. Cluster randomisation will also 
ensure that the trial is logistically simpler to run, as ran- 
domisation will be carried out at a single point in time, 
and midwives will have the benefit of remaining in 
either the intervention or control arm for the duration 
of the trial. The cluster units to be used are midwifery 
teams, which are teams of midwives who visit a set 
number of primary care general practices to deliver 
antenatal and postnatal care. The trial is to be carried 
out within a single primary care trust within the West 
Midlands. The nature of this design therefore means 
that the number of clusters available is fixed at the 
number of midwifery teams delivering care within the 
region. 

At the time of designing the trial, current breastfeed- 
ing rates, at 6 weeks postpartum, in the region were 
around 40%. National targets had been set to encourage 
all regions to increase rates to around 50%. It was 
known that 40 clusters are available (i.e. there are 40 
midwifery teams within the region), so that the number 
of clusters per arm was fixed to k = 20. Estimates of 
ICC range from 0.005 to 0.07 in similar trials [6,7]. 

Firstly, the feasibility check is implemented to deter- 
mine whether the 20 available clusters per arm are suffi- 
cient to detect the 10 percentage point change assuming 
the lower estimated ICC (0.005). Where the power is set 
at 80%, the required sample size per arm to detect an 
increase in percentages from 40% to 50%, under indivi- 
dual randomisation, is «/ = 385 (Table 1). When multi- 
plied by the ICC this gives 385 x 0.005 = 1.925 which is 
less than k = 20. This therefore means that 20 clusters 
per arm will be sufficient for this design (provided an 
adequate number of individuals are recruited in each 
cluster). A similar design with 90% power would require 
519 individuals per arm using individual randomisation. 
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Figure 1 Maximum achievable power for various different standardised effect sizes: limiting values as the cluster size approaches 
infinity 



Again, because 515 x 0.005 = 2.57 < 20, this also means 
that 20 clusters per arm will be sufficient to detect an 
increase from 40% to 50% with 90% power (again pro- 
vided an adequate number of individuals are recruited 
in each cluster). Equation 14 shows that under the 
assumption that p = 0.005, either 22 or 30 individuals 
will be required per cluster (for 80% and 90% power 
respectively). 

Secondly, the feasibility check is evaluated to deter- 
mine whether the 20 available clusters per arm is suffi- 
cient to detect the 10 percentage point change assuming 
the higher estimated ICC (0.07). However, in this case 
as 385 x 0.07 = 26.95 >20, so the condition is not met 
at the 80% power level (and so neither at the 90% 
power level). Therefore, 20 clusters per arm is not a suf- 
ficient number of clusters, however many individuals are 
included within each cluster, to detect the required 
effect size at the pre-specified power and significance. 

Since this latter design is not feasible, formulae at 
equation 25 allow determination of the minimum 
detectable difference (or maximum achievable power 



from equation 27). For a cluster trial with 80% power, 
and assuming a baseline event rate of Ti\ = 0.40, the 
minimum detectable difference is 0.12 (to 2 d.p.). That 
is, a change from 40% to 52%. To detect a change from 
40% to 52% with 80% power, 189 individuals would be 
required per cluster. For a trial with 90% power, the 
minimum detectable difference is 0.14 (i.e. a change 
from 40% to 54%). To detect a change from 40% to 54% 
with 90% power, 146 individuals would be required per 
cluster. 

Discussion 

In health care service evaluation cluster RCTs, pre-spe- 
cifying the numbers of clusters available, are frequently 
used. That is, trials are designed based on a limited 
number of cluster units (e.g. GP practices) willing or 
able to participate [6,7,9,10]. In contrast, sample size 
methods are almost exclusively based on pre-specified 
average cluster sizes, as opposed to number of clusters 
available [1,4]. Whilst mapping sample size formulae 
from one method to the other is straightforward, a limit 
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Figure 2 Maximum achievable power to detect increases in 10 percentage points for various different baseline proportions (jt,): 
limiting values as the cluster size approaches infinity 



on the precision of estimates in such designs leads to a 
maximum available power (that is, a limit on the power 
available irrespective of how large the clusters are) and 
minimum detectable differences (that is, a limit on the 
difference detectable irrespective of how large the clus- 
ters are). 

For example, with just 15 clusters available per arm 
and an ICC of 0.05, power achievable for a trial aiming 
to detect an increase in percentage change from 40% to 
50% is limited to about 62%, irrespective of how large 
the clusters are made. Cluster trials with just 15 clusters 
available per arm are not uncommon and a 10 percen- 
tage point change not an unrealistic goal in many set- 
tings. However, power levels as low as 60% are clearly 
sub-optimal, and might not be regarded as sufficiently 
high to warrant the costs of a clinical trial. Formulae 
provided here for minimum detectable differences show 
that to retain a power level in the region of 80%, trial- 
lists would have to be content with detecting a differ- 
ence above a twelve percentage point change. Re- 
formulation of the problem in terms of minimum 



detectable difference can thus be used to compare the 
difference which is statistically detectable (at acceptable 
power levels) to that which is clinically, or managerially, 
important. 

Should the situation arise in which the postulated ICC 
suggests that it is not possible to detect the required dif- 
ference (at pre-specified power), it might be tempting to 
lower the estimated ICC. Such an approach should be 
strongly discouraged, since loss of power will most likely 
result, potentially leading to a non-significant finding 
[12]. Rather, formulae here allow sensitivity of the 
design to be explored in light of possible variations in 
the ICC. However, other avenues to increase available 
power might reasonably be considered. For example, it 
may be plausible to consider relaxing alpha and even to 
set alpha and beta equivalent [17]. Or alternatively, 
incorporating prior information in a Bayesian framework 
may lead to increases in power. It might further be 
argued that studies of limited power are of importance 
as they contribute to the evidence framework by ulti- 
mately becoming part of future systematic reviews [18], 



Hemming et al. BMC Medical Research Methodology 201 1, 11:102 
http://www.biomedcentral.eom/1 471-2288/1 1 /1 02 



Page 9 of 1 1 



5 

u 

It 




: 


:: 


.04 .06 


.08 .1 






ICC 








No. Clusters 






10 


- 20 30 


50 100 
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and the methods presented here thus allow for the 
achievable power to be computed. Before-and-after type 
studies offer a further avenue of exploration, as by their 
very nature induce smaller intra-cluster correlations. 

Methodological limitations of the work presented here 
include the assumption of equal sized arms; equal stan- 
dard deviations; Normality assumptions (which might 
not be tenable for small numbers of clusters as well as 
small numbers of individuals); and lack of continuity 
correction for binary variables. Furthermore, CRCTs 
with a small number of clusters are controversial, pri- 
marily because the small number of units randomised 
open results to the possibility of bias and approxima- 
tions to Normality become questionable. However, 
despite this, CRCTs with a small number of clusters are 
frequently reported. The Medical Research Council, for 
instance, has issued guidelines that cluster trials with 
fewer than 5 clusters per arm are inadvisable [19]. 
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Table 1 Estimates of the Minimum Detectable Difference 
(MDD) for trial with 20 clusters per arm, to detect an 
increase in an event rate from 40% 
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Others have considered some of the issues involved in 
community based intervention trials with a small num- 
ber of clusters, but have focused on issues of restricted 
randomisation and whether the analysis should be at the 
individual or cluster level [20]. 

Conclusions 

Evaluations of health service interventions using CRCTs, 
are frequently designed with a limited available number 
of clusters. Sample size formulae for CRCTs, are almost 
exclusively evaluated as a function of the average cluster 
size. Where no formal limits exist on the number of 
individuals enrolled within each cluster, increasing the 
numbers of individuals leads to a limited increase in the 
study power. This in turn means that for a trial with a 
fixed number of clusters, some designs will not be feasi- 
ble, and we have provided simple guidelines to evaluate 
feasibility. A simple rule is that the number of clusters 
(k) will be sufficient provided: 

k > rii x p. 

For infeasible designs to retain acceptable levels of 
power, detectable difference might not be as small as 
desired, leading to the notion of a minimum detectable 
difference. Useful aidese memoires are that the detect- 
able difference in a CRCT is that of an individual RCT 
inflated by the square root of the variance inflation fac- 
tor; and the power is that under individual randomisa- 
tion with the standardised effect size deflated by the 
square root of the variance inflation factor. A STATA 
function, clusterSampleSize.ado, allows practical imple- 
mentation of all formulae discussed here and is available 
from the author. 
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